policy gradient

[ˈpɒləsi ˈɡreɪdiənt]

Basic definitions

Web definition
策略梯度

Updated: 2026-04-19 15:07:34

Bilingual example sentences

1、

This dissertation studies fuzzy policy gradient reinforcement learning, which incorporates a priori knowledge through fuzzy inference systems.

研究了利用模糊推理系统引入先验知识的策略梯度增强学习算法。
Source: Internet excerpts
2、

In the research on the theoretical framework of policy gradient reinforcement learning, it is proved that the gradient estimation formulas of all existing policy gradient algorithms can be expressed in a unified form.

本文的创新点和研究成果主要包括：1、在策略梯度增强学习理论框架的研究中，证明了现有策略梯度增强学习算法的梯度估计公式都符合统一的形式。
Source: Internet excerpts
3、

Then, by utilizing the features of this model, an online optimization algorithm that combines policy gradient estimation and stochastic approximation is derived.

利用此模型的动态结构特性，结合在线学习估计梯度与随机逼近改进策略，提出动态电源管理策略的在线优化算法。
Source: Internet excerpts
4、

Two fuzzy policy gradient reinforcement learning algorithms are proposed for Markov Decision Processes with discrete and continuous actions, respectively.

本文分别针对具有离散行为空间和连续行为空间的马氏决策问题，提出了两种模糊策略梯度增强学习方法（Fuzzy Policy Gradient：FPG）。
Source: Internet excerpts
5、

On the other hand, the variance of the gradient estimate in existing policy gradient algorithms is usually large, which makes convergence very slow and is a significant obstacle to the wide application of policy gradient algorithms.

但是另一方面，由于在梯度估计过程中方差过大，使得策略梯度算法收敛速度很慢，成为策略梯度增强学习被广泛应用的一个障碍。
Source: Internet excerpts
6、

For the problem of multi-wheel coordination in the motion control of a lunar rover, an adaptive control method based on hybrid policy gradient reinforcement learning is proposed.

针对月球车运动控制中的多轮协调问题，提出了一种基于混合策略梯度增强学习的自适应控制方法。
Source: Internet excerpts
7、

According to this framework, some current policy gradient algorithms are generalized.

并且在上述理论框架的指导下，对现有的策略梯度算法进行了推广。
Source: Internet excerpts
8、

Starting from a historical analysis, this paper demonstrates that the essence of the evolution of Chinese regional policy is an increase in the regional policy gradient.

本文从历史分析入手，用大量事实证明了中国区域政策演变的实质就是倾斜度的提高。
Source: Internet excerpts
9、

A hybrid policy gradient reinforcement learning control method is proposed to solve this complex optimization control problem, in which teacher signals are difficult to obtain and fuzzy rules are difficult to design.

针对这种导师信号难以获取、模糊规则难以制定的复杂优化控制问题，本文提出了一种基于混合式策略梯度增强学习PG-SVM的多轮协调控制方法。
Source: Internet excerpts
10、

The experimental results show that the convergence speed of policy gradient algorithms can be increased greatly by reducing the variance.

仿真实验结果表明，通过减小方差，算法能够有效地提高收敛速度。
Source: Internet excerpts
11、

The Optimal Reward Baseline for Policy-Gradient Reinforcement Learning

策略梯度强化学习中的最优回报基线
Source: Internet excerpts
